Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features
Abstract
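In natural language processing, formulating individual problems as classification problems and solving them with inductive learning methods has been highly successful. This approach has one major drawback: the training data that inductive learning requires must be prepared. To address this, seed-type learning methods have appeared in recent years; they use a large amount of unlabeled training data to raise the accuracy of classification rules obtained from a small amount of labeled training data. Here we attempt to apply Co-training, the central method of this kind, to word sense disambiguation rules. Co-training, however, requires two independent sets of features. In practice this independence condition is hard to satisfy, so the accuracy of the learned rules levels off. To avoid this problem, this paper improves Co-training by taking the co-occurrence of features into account when selecting the examples to add. In the experiments, the method was applied to three word sense selection problems. The resulting accuracy improved beyond that of standard Co-training.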
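The Co-training loop that the abstract builds on can be sketched as follows. This is a minimal Python illustration of standard Co-training under stated assumptions, not the paper's implementation: the tiny Naive Bayes scorer, the set-valued features, the toy data for the word "bank", and every name (`train`, `predict`, `co_train`, `accept`, `min_margin`) are hypothetical. The abstract does not spell out how feature co-occurrence enters the selection step, so `accept` is only a placeholder hook marking where such a check could go.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Fit a tiny Naive Bayes model on (feature_set, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        feat_counts[label].update(feats)
        vocab.update(feats)
    return label_counts, feat_counts, vocab

def predict(model, feats):
    """Return (best_label, margin); margin is the log-score gap to the runner-up."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one smoothing; the extra +1 reserves mass for unseen features.
        denom = sum(feat_counts[label].values()) + len(vocab) + 1
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_counts[label][f] + 1) / denom)
        scores[label] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else float("inf")
    return ranked[0][0], margin

def co_train(labeled, unlabeled, rounds=5, per_label=1, min_margin=0.0,
             accept=lambda view_a, view_b, label: True):
    """labeled: list of (view_a, view_b, label); unlabeled: list of (view_a, view_b).

    `accept` is a hypothetical hook: a co-occurrence-aware variant could
    reject candidates here. The default accepts everything (plain Co-training).
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        chosen = {}  # pool index -> label assigned by whichever view picked it first
        for view in (0, 1):
            model = train([(ex[view], ex[2]) for ex in labeled])
            candidates = defaultdict(list)
            for i, ex in enumerate(pool):
                label, margin = predict(model, ex[view])
                if margin > min_margin and accept(ex[0], ex[1], label):
                    candidates[label].append((margin, i))
            # Each view contributes its most confident examples per label.
            for label, cands in candidates.items():
                for _, i in sorted(cands, reverse=True)[:per_label]:
                    chosen.setdefault(i, label)
        if not chosen:
            break
        labeled += [(pool[i][0], pool[i][1], label) for i, label in chosen.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    return labeled

# Toy data (invented for illustration): view A = nearby words, view B = topic words.
seed = [({"river"}, {"water"}, "shore"), ({"loan"}, {"money"}, "finance")]
pool = [({"river"}, {"fish"}), ({"loan"}, {"interest"}), ({"deposit"}, {"money"})]
result = co_train(seed, pool)
```

In this sketch each view's classifier labels the unlabeled pool, and only its most confident predictions per sense are moved into the labeled set, so one view can teach the other about examples it cannot classify on its own (for instance, `{"deposit"}` is unseen in view A but is labeled through `{"money"}` in view B). Passing a stricter `accept` function is one way a selection criterion based on feature co-occurrence could be tried.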
Similar resources
Latent Semantic Word Sense Disambiguation Using Global Co-occurrence Information
In this paper, I propose a novel word sense disambiguation method based on the global co-occurrence information using NMF. When I calculate the dependency relation matrix, the existing method tends to produce a very sparse co-occurrence matrix from a small training set. Therefore, the NMF algorithm sometimes does not converge to desired solutions. To obtain a large number of co-occurrence relatio...
Self-training and co-training in biomedical word sense disambiguation
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words. Due to the scarcity of training data, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, is worth researching. We present preliminary results of two semi-supervised learnin...
Co-training and Self-training for Word Sense Disambiguation
This paper investigates the application of cotraining and self-training to word sense disambiguation. Optimal and empirical parameter selection methods for co-training and self-training are investigated, with various degrees of error reduction. A new method that combines cotraining with majority voting is introduced, with the effect of smoothing the bootstrapping learning curves, and improving ...
Word Sense Disambiguation by Web mining for word co-occurrence probabilities
This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill’s rule-based part-of-speech tagger. Head words are represented as feature vec...
Word Sense Disambiguation Using Vectors of Co-occurrence Information
This paper reports on the word sense disambiguation of Korean nouns using co-occurrence information in context. For a given noun, its local contextual word distribution is not enough to express its semantic characteristics for noun sense disambiguation. This paper proposes a cluster-based sense as a base vector. Contextual noise is removed by a term weighting method, and hypernyms of remain...